IN THE SPECIFICATION

Please replace the paragraph beginning at page 1, line 11, with the following replacement
paragraph: 

In a shared-memory multiprocessor system, it appears to a user that all processors read and 
modify state information in a single shared memory store. A substantial difficulty in implementing 
such a system, and particularly a distributed version of such a system, is propagating values from one 
processor to another, in that the actual values are created close to one processor but might be used 
by many other processors in the system. If the implementation could accurately predict the sharing
patterns of a given program, the processor nodes of a distributed multiprocessor system could spend 
more of their time computing and less of their time waiting for values to be fetched from remote 
locations. Despite the development of processor features such as non-blocking caches and 
out-of-order instruction execution, the relatively long access latency in a distributed shared-memory 
system remains a serious impediment to performance. 

Please replace the paragraph beginning at page 2, line 22, with the following replacement 
paragraph: 

The invention provides improved techniques for determining a set of predicted readers of a 
data block subject to a write request in a shared-memory multiprocessor system. In accordance with 
an aspect of the invention, a current set of readers of the data block are determined, and then the set 
of predicted readers is generated based on the current set of readers and at least one additional set 
of readers representative of at least a portion of a global history of a directory associated with the 
data block. In one possible implementation, the set of predicted readers is generated by applying
a function to the current set of readers and one or more additional sets of readers. The function may 
be, for example, a union function, an intersection function or a pattern-based function, and the 
directory and data block may be elements of a memory associated with a particular processor node 
of the multiprocessor system. 



Please replace the paragraph beginning at page 4, line 20, with the following replacement 
paragraph: 

The invention will be illustrated herein in conjunction with exemplary distributed shared- 
memory multiprocessor systems. It should be understood, however, that the invention is more 
generally applicable to any shared-memory multiprocessor system in which it is desirable to provide 
improved performance through the use of directory-based prediction. The term "multiprocessor 
system" as used herein is intended to include any device in which retrieved instructions are executed 
using two or more processors. Exemplary processors in accordance with the invention may
include, for example, microprocessors, central processing units (CPUs), very long instruction word 
(VLIW) processors, single-issue processors, multi-issue processors, digital signal processors, 
application-specific integrated circuits (ASICs), personal computers, mainframe computers, network 
computers, workstations and servers, and other types of data processing devices, as well as portions 
and combinations of these and other devices. 



Please replace the paragraph beginning at page 5, line 3, with the following replacement 
paragraph: 



FIGS. 1 and 2 illustrate the handling of exemplary read and write requests,
respectively, in a distributed shared-memory multiprocessor system 100. The system 100 is an 
example of one type of system in which the directory-based prediction of the present invention may 
be implemented. The system 100 includes nodes A, B and C, which are connected to an 
interconnection network 102 via corresponding network interfaces (NIs) 104A, 104B and 104C, 
respectively. The nodes A, B and C include processors 106A, 106B and 106C, memories 108A,
108B and 108C, and buses 110A, 110B and 110C, respectively, arranged as shown. Within a given
node i of the system 100, i = A, B, C, the processor 106i, memory 108i and network interface 104i
are each coupled to and communicate over the corresponding bus 110i.

Please replace the paragraph beginning at page 6, line 27, with the following replacement
paragraph: 

In the exemplary implementation of the illustrative embodiment to be described in
conjunction with FIG. 4 below, a history depth of four is used, i.e., the predicted set of readers 
generated for a current write operation on a given block is determined as a function of the current 
set of readers of that block and the three other most recent sets of readers stored in a predictor shift 
register. 
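As a rough illustration of the history depth of four described above, the following Python sketch models the predictor shift register as a fixed-length queue of reader sets. The class name `ReaderHistoryPredictor`, its method names, the update order, and the use of union as the default combining function are assumptions made for illustration only; they are not taken from the specification.

```python
from collections import deque
from functools import reduce


class ReaderHistoryPredictor:
    """Hypothetical model of a predictor shift register holding reader sets."""

    def __init__(self, depth=4, combine=frozenset.union):
        # A depth of N covers the current set plus the N - 1 most recent sets.
        self.combine = combine
        self.history = deque(maxlen=depth - 1)

    def predict(self, current_readers):
        """Return the predicted readers for a write, then shift in the current set."""
        current = frozenset(current_readers)
        predicted = reduce(self.combine, [current, *self.history])
        self.history.appendleft(current)  # the oldest stored set is discarded when full
        return predicted


if __name__ == "__main__":
    predictor = ReaderHistoryPredictor(depth=4)
    predictor.predict({"a", "b"})
    predictor.predict({"b", "c"})
    print(sorted(predictor.predict({"a", "d"})))  # ['a', 'b', 'c', 'd']
```

Passing `combine=frozenset.intersection` to the same hypothetical class gives the more conservative variant.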

Please replace the paragraph beginning at page 7, line 3, with the following replacement 
paragraph: 



FIG. 4 shows an example of the operation of a directory-based predictor in the illustrative 
embodiment of the invention. In this example, a write request is received for a data block X 
associated with a memory and directory 120. The current readers of the data block X are processors 
in a set of nodes {a, b, c} of a multiprocessor system which includes nodes denoted a, b, c, d, e, f, 
g, h, i, j, k, l, m, etc. Each of the nodes may represent a node of a multiprocessor system such as that
illustrated in conjunction with FIGS. 1 and 2. The predictor in this example uses a shift register 122 
in a manner to be described below. 



Please replace the paragraph beginning at page 8, line 7, with the following replacement 
paragraph: 

The choice of union function or intersection function in step 216 of FIG. 5 generally depends
on the desired degree of aggressiveness in the data forwarding. For example, in high-bandwidth 
systems, the more aggressive data forwarding associated with the union function may be more
appropriate, while for low-bandwidth systems, the intersection function may be more appropriate.
It should be noted that these functions are given by way of example only, and the invention can be 
implemented using other types of functions. As another example, pattern-based functions can be 
used in conjunction with the present invention. Such functions are described in greater detail in, e.g.,
T. Yeh and Y. Patt, "Two-Level Adaptive Branch Prediction," Proceedings of the 24th Annual 
ACM/IEEE International Symposium and Workshop on Microarchitecture, Los Alamitos, CA, 
November 1991, which is incorporated by reference herein. 
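The difference in aggressiveness between the union and intersection functions can be seen on a small example. The reader sets below are invented purely for illustration and do not come from the specification or its figures.

```python
# Hypothetical reader sets: the current set and the three most recent stored sets.
current = {"a", "b", "c"}
history = [{"a", "b", "d"}, {"a", "c"}, {"a", "b"}]

# Union forwards to every node seen in any set (more aggressive, more traffic).
union_prediction = current.union(*history)

# Intersection forwards only to nodes present in every set (more conservative).
intersection_prediction = current.intersection(*history)

print(sorted(union_prediction))         # ['a', 'b', 'c', 'd']
print(sorted(intersection_prediction))  # ['a']
```

In this sketch the union forwards the block to four nodes while the intersection forwards it to one, which is why the former suits systems with bandwidth to spare.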

Please replace the paragraph beginning at page 12, line 21, with the following replacement
paragraph: 



Tables 2 and 5 show the top ten schemes, in terms of specificity,
in the set of possible predictors using direct update and forwarded update, respectively.



Please replace the paragraph beginning at page 12, line 26, with the following replacement 
paragraph: 

Tables 3 and 6 show the top ten schemes, in terms of sensitivity,
in the set of possible predictors using direct update and forwarded update, respectively. All are union
schemes with the maximum history depth used in this example, i.e., a history depth of 4.
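For orientation, the sketch below computes sensitivity and specificity for a single prediction using the standard definitions of those statistics: correctly predicted readers over all actual readers, and correctly excluded nodes over all actual non-readers. Whether the tables use exactly these definitions is an assumption, and the function name `prediction_metrics` and the node sets are hypothetical.

```python
def prediction_metrics(predicted, actual, all_nodes):
    """Return (sensitivity, specificity) for one predicted set of readers."""
    predicted, actual, all_nodes = set(predicted), set(actual), set(all_nodes)
    true_pos = len(predicted & actual)    # predicted readers that did read
    false_pos = len(predicted - actual)   # predicted readers that did not read
    false_neg = len(actual - predicted)   # actual readers that were missed
    true_neg = len(all_nodes - predicted - actual)
    sensitivity = true_pos / (true_pos + false_neg) if true_pos + false_neg else None
    specificity = true_neg / (true_neg + false_pos) if true_neg + false_pos else None
    return sensitivity, specificity


if __name__ == "__main__":
    # Eight hypothetical nodes; three predicted readers, three actual readers.
    print(prediction_metrics({"a", "b", "c"}, {"a", "b", "d"}, "abcdefgh"))
```

An aggressive function such as union tends to raise sensitivity at the cost of specificity, since it predicts more readers overall.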



Please delete the paragraph beginning at page 13, line 1, as follows: 



Tables 5 and 6 show the top ten predictors in the set of possible forwarded update predictors
in terms of specificity and sensitivity, respectively.

Please replace the paragraph of the abstract, at page 19, line 2, with the following rewritten
paragraph: 



A set of predicted readers are determined for a data block subject to a write request in a
shared-memory multiprocessor system by first determining a current set of readers of the data block,
and then generating the set of predicted readers based on the current set of readers and at least one
additional set of readers representative of at least a portion of a global history of a directory
associated with the data block. In one possible implementation, the set of predicted readers are
generated by applying a function to the current set of readers and one or more additional sets of
readers.